The curation of large-scale, diverse datasets for robust weed detection is extremely time-consuming and resource-intensive in practice. Generative artificial intelligence (AI) opens up opportunities for image generation to supplement real-world image acquisition and annotation efforts. However, it is not a trivial task to generate high-quality, multi-class weed images that capture the nuances and variations in visual appearance needed for enhanced weed detection. This study presents a novel investigation of advanced stable diffusion (SD) integrated with IP-Adapter, a module with image prompt capability, for weed image generation. Within the IP-Adapter-based model, two image feature encoders, CLIP (Contrastive Language-Image Pre-training) and BioCLIP (a vision foundation model for biological images), were utilized to generate weed instances, which were then inserted into existing weed images. Image generation and weed detection experiments were conducted on a 10-class weed dataset captured in vegetable fields. The perceptual quality of the generated images was assessed in terms of Fréchet Inception Distance (FID) and Inception Score (IS). YOLOv11 (You Only Look Once version 11) models were trained for weed detection, achieving an average improvement of 1.26% in mAP@50:95 when training on inserted weed instances combined with real ones, compared to using the original images alone. Both the weed dataset and software programs in this study will be made publicly available. This study offers valuable insights into the use of IP-Adapter-based SD for weed image generation and weed detection.